686 research outputs found

Recognition of functional regions in primary structures using a set of property patterns

Author: Bork Peer
Publication venue: Published by Elsevier B.V.
Publication date: 23/10/1989

Thirty-two consensus patterns for a set of functional regions and structural motifs in protein sequences were constructed. The pattern definition is heuristic and based on 11 selected steric and physicochemical properties. By comparison with these patterns, it was possible to identify 1532 sites in 8702 protein sequences of SWISSPROT without false positives. Screening against such a pattern library offers a considerable chance of identifying functional regions or structural motifs in proteins for which only the sequence is known.
Keywords: Pattern search; Property pattern; Primary structure; Recognitio
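The abstract does not list the 11 properties or the 32 patterns, so the following is only a minimal Python sketch of the general idea of property-pattern matching. Each pattern position names a residue property, and a window matches when every residue has the property required at its position. The property classes, the function names and the example pattern are hypothetical illustrations, not the paper's definitions.

```python
# Hypothetical property classes for illustration only; the paper's 11
# steric and physicochemical properties are not reproduced here.
PROPERTIES = {
    "hydrophobic": set("AVLIMFWC"),
    "aromatic": set("FWYH"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "any": set("ACDEFGHIKLMNPQRSTVWY"),
}


def matches_at(seq: str, pattern: list[str], start: int) -> bool:
    """True if residue start+i has the property named at pattern position i."""
    return all(seq[start + i] in PROPERTIES[prop] for i, prop in enumerate(pattern))


def find_sites(seq: str, pattern: list[str]) -> list[int]:
    """Return the 0-based start positions of every window matching the pattern."""
    seq = seq.upper()
    width = len(pattern)
    return [s for s in range(len(seq) - width + 1) if matches_at(seq, pattern, s)]


# Example pattern: hydrophobic, anything, acidic, basic.
print(find_sites("MKLADKELLEK", ["hydrophobic", "any", "negative", "positive"]))
# prints [2, 7]
```

Allowing a set of acceptable properties per position, rather than a single one, would be a natural extension of this sketch.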

Elsevier - Publisher Connector

The Tenth Asia Pacific Bioinformatics Conference (APBC 2012)

Author: Bork Peer
Chen Yi-Ping Phoebe
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Directory of Open Access Journals

PubMed Central

PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments

Author: Bork Peer
Suyama Mikita
Torrents David
Publication venue: Oxford University Press
Publication date: 01/07/2006

PAL2NAL is a web server that constructs a multiple codon alignment from the corresponding aligned protein sequences. Such codon alignments can be used to evaluate the type and rate of nucleotide substitutions in coding DNA for a wide range of evolutionary analyses, such as identifying the level of selective constraint acting on genes, or to perform DNA-based phylogenetic studies. The server takes a protein sequence alignment and the corresponding DNA sequences as input. Unlike other existing applications, this server can construct codon alignments even if the input DNA sequence has mismatches with the input protein sequence, or contains untranslated regions and polyA tails. The server can also deal with frame shifts and in-frame stop codons in the input models, and is thus suitable for the analysis of pseudogenes. Another distinct feature is that the user can specify a subregion of the input alignment in order to analyze specific functional domains or exons of interest. The PAL2NAL server is available at
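To illustrate what a codon alignment is, here is a minimal Python sketch that threads each DNA sequence through its aligned protein, writing three gap characters for every protein gap. It assumes the simplest case, in which each DNA sequence is exactly the coding sequence of its protein. PAL2NAL's handling of mismatches, untranslated regions, polyA tails, frame shifts and in-frame stop codons is not reproduced here, and the function name is hypothetical.

```python
def protein_to_codon_alignment(aligned_proteins, cds_seqs):
    """Build a codon alignment from a protein alignment and matching CDSs.

    aligned_proteins: dict name -> gapped protein sequence ('-' = gap)
    cds_seqs: dict name -> ungapped coding DNA sequence
    Simplifying assumption: each CDS has exactly 3 nucleotides per residue,
    with no UTRs, frame shifts or mismatches.
    """
    codon_aln = {}
    for name, protein in aligned_proteins.items():
        cds = cds_seqs[name]
        n_residues = len(protein.replace("-", ""))
        if len(cds) != 3 * n_residues:
            raise ValueError(f"{name}: expected {3 * n_residues} nt, got {len(cds)}")
        codons, pos = [], 0
        for residue in protein:
            if residue == "-":
                codons.append("---")  # a protein gap becomes a codon gap
            else:
                codons.append(cds[pos:pos + 3])  # next codon of the CDS
                pos += 3
        codon_aln[name] = "".join(codons)
    return codon_aln


aln = protein_to_codon_alignment(
    {"seqA": "MK-L", "seqB": "M-RL"},
    {"seqA": "ATGAAACTG", "seqB": "ATGCGTCTT"},
)
print(aln["seqA"])  # ATGAAA---CTG
print(aln["seqB"])  # ATG---CGTCTT
```

Because every residue maps to exactly one codon, the output rows keep the same column structure as the protein alignment, which is what downstream substitution-rate analyses rely on.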

Crossref

PubMed Central

MDC Repository

DCD – a novel plant specific domain in proteins involved in development and programmed cell death

Author: Bork Peer
Doerks Tobias
Tenhaken Raimund
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Recognition of microbial pathogens by plants triggers the hypersensitive reaction, a common form of programmed cell death in plants. These dying cells generate signals that activate the plant immune system and alert neighboring cells, as well as the whole plant, to activate defense responses that limit the spread of the pathogen. Apart from the pathogen recognition step, the molecular mechanisms behind the hypersensitive reaction are largely unknown. We delineate the NRP gene in soybean, which is specifically induced during this programmed cell death and encodes a protein containing a novel domain that is common in different plant proteins. RESULTS: Sequence analysis of the protein encoded by the soybean NRP gene led to the identification of a novel domain, which we named DCD because it is found in plant proteins involved in development and cell death. The domain is shared by several proteins in the Arabidopsis and rice genomes, which otherwise show different protein architectures. Biological studies indicate a role for these proteins in phytohormone response, embryo development and programmed cell death induced by pathogens or ozone. CONCLUSION: It is tempting to speculate that the DCD domain mediates signaling in plant development and programmed cell death, and that it could thus be used to identify interacting proteins and gain further molecular insight into these processes.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MDC Repository

Hochschulschriftenserver - Universität Frankfurt am Main

Extraction of Transcript Diversity from Scientific Literature

Author: Lars J Jensen
Parantu K Shah
Peer Bork
Philip Bourne
Stéphanie Boué
Publication venue: Public Library of Science
Publication date: 01/01/2005

Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

FigShare

Towards standardisation of naming novel prokaryotic taxa in the age of high-throughput microbiology

Author: Bork Peer
Hildebrand Falk
Pallen Mark J
Publication venue: BMJ
Publication date: 06/06/2020

University of East Anglia digital repository

Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?

Author: Bork Peer
Harrington Eoghan D
Pallejà Albert
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these overlaps are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, because overlapping reading frames increase the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation. RESULTS: We analysed the genes that overlap by 60 bp or more among 338 fully sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene's length differed significantly from that of its orthologs, the gene was considered unlikely to be functional and therefore the result of an error in either sequencing or gene prediction. Focusing on 715 co-directional overlaps longer than 60 bp, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at the 5' end of the gene (409 overlaps); ii) fragmentation of a gene caused by a frameshift (163); iii) 3'-end extension of the upstream gene due to either a frameshift at the 3' end of the gene or a point mutation at the stop codon (68); iv) redundant gene predictions (4); v) 5'- and 3'-end extension, a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless, we found some long convergent overlaps (54) that might be true overlaps, although a substantial fraction of convergent overlaps could be classified as group iii) (124).
CONCLUSION: Among the 968 overlaps longer than 60 bp that we analysed, we did not find a single real one in the co-directional or divergent orientations, and concluded that there had been an excessive number of misannotations. Only the convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
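The abstract does not state the flagging rule itself, so the following Python sketch only illustrates the quantities involved, under stated assumptions. Genes are hypothetical (start, end, strand) tuples with 1-based inclusive coordinates, overlap orientation is derived from the two strands, and a gene is flagged when its length deviates from the median length of its orthologs by more than an arbitrary 20%. That threshold and the function names are assumptions, not the paper's significance test.

```python
from statistics import median

# A gene is a hypothetical (start, end, strand) tuple, 1-based and inclusive,
# with strand either "+" or "-".


def overlap_length(a, b):
    """Number of base pairs shared by genes a and b (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def orientation(a, b):
    """Classify a gene pair as co-directional, convergent or divergent."""
    left, right = sorted([a, b], key=lambda g: g[0])
    if left[2] == right[2]:
        return "co-directional"
    if left[2] == "+":
        return "convergent"  # left on +, right on -: the 3' ends meet
    return "divergent"  # left on -, right on +: the 5' ends meet


def is_long_overlap(a, b, min_bp=60):
    """True if the genes overlap by at least min_bp base pairs."""
    return overlap_length(a, b) >= min_bp


def length_outlier(gene_len, ortholog_lens, tolerance=0.2):
    """Hypothetical rule: flag a gene whose length differs from the median
    ortholog length by more than `tolerance` (a fraction of that median)."""
    ref = median(ortholog_lens)
    return abs(gene_len - ref) / ref > tolerance


upstream = (100, 400, "+")
downstream = (341, 900, "+")
print(overlap_length(upstream, downstream), orientation(upstream, downstream))
# prints 60 co-directional
print(length_outlier(560, [300, 310, 305]))  # prints True
```

Comparing against the median rather than the mean keeps a single mispredicted ortholog from shifting the reference length.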

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MDC Repository

The PAM domain, a multi-protein complex-associated module with an all-alpha-helix fold

Author: Bork Peer
Ciccarelli Francesca D
Izaurralde Elisa
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: Multimeric protein complexes play a role in many cellular pathways and are highly interconnected with various other proteins. Characterizing their domain composition and organization provides useful information on the specific role of each region of their sequence. RESULTS: We identified a new module, the PAM domain (PCI/PINT associated module), present in single subunits of well-characterized multiprotein complexes, such as the regulatory lid of the 26S proteasome, the COP-9 signalosome and the Sac3-Thp1 complex. This module is a domain of approximately 200 residues with a predicted TPR-like all-alpha-helical fold. CONCLUSIONS: The occurrence of the PAM domain in specific subunits of multimeric protein complexes, together with the role of other all-alpha-helical folds in protein-protein interactions, suggests a function for this domain in mediating transient binding to diverse target proteins.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

King's Research Portal

MDC Repository

STRING and STITCH: known and predicted interactions between proteins and chemicals

Author: Christian von Mering
Lars J. Jensen
Manuel Stark
Michael Kuhn
Peer Bork
Samuel Chaffron
Publication venue
Publication date: 06/09/2008

Information on protein-protein and protein-chemical interactions is essential for understanding cellular functions. The STRING and STITCH web resources integrate interaction evidence derived from pathways, automatic literature mining, primary experimental data, and genomic context. The resulting interaction networks cover 1.5 million proteins from 373 organisms and 68,000 chemicals
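The abstract does not say how evidence from the different sources is merged. One simple way to combine per-source confidence scores, assuming the sources are independent, is a noisy-OR: the combined score is one minus the product of the per-source probabilities of being wrong. The Python sketch below shows only that idea; STRING's own scoring is more involved (for example, it corrects for the probability of a random association), so this is an illustration rather than its implementation, and the function name and example scores are hypothetical.

```python
from math import prod


def combine_scores(channel_scores):
    """Noisy-OR combination of per-channel confidence scores in [0, 1].

    Assumes the evidence channels are independent: the association is judged
    false only if every channel's evidence is wrong, so
    combined = 1 - prod(1 - s).
    """
    scores = list(channel_scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores for one protein pair from two evidence channels.
print(combine_scores([0.5, 0.5]))  # prints 0.75
```

Under this scheme, several moderately supportive channels can together yield a higher confidence than any single channel, while a pair with no evidence scores zero.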

Nature Precedings

Annotation of the M. tuberculosis Hypothetical Orfeome: Adding Functional Information to More than Half of the Uncharacterized Proteins

Author: Bork Peer
Doerks Tobias
Minguez Pablo
van Noort Vera
Publication venue: Public Library of Science
Publication date: 01/01/2011

The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein-coding genes, of which more than a thousand have been categorized as ‘hypothetical’, implying that not even weak functional associations have been identified for them so far. Here we predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways, and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.

Lirias

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

MDC Repository

FigShare